backup: veeam kvm integration #12991

Draft
shwstppr wants to merge 146 commits into apache:main from shapeblue:integration-veeam-kvm

Contributor

@shwstppr commented Apr 9, 2026

Description

Design spec: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421954133

This PR introduces the initial implementation of Veeam integration support for KVM in CloudStack by adding a UHAPI-compatible server and image server components.

Veeam Backup & Replication interacts with virtualization platforms using its Universal Hypervisor API (UHAPI). To enable backup and restore workflows for CloudStack-managed KVM environments, this change introduces a UHAPI server that exposes CloudStack resources through a UHAPI-compatible interface.

In addition to the control plane APIs, an image server component is introduced to handle the data transfer operations required during backup and restore workflows.

Architecture

The integration consists of two main components:

  1. UHAPI Server (Control Plane), named CloudStack Veeam Control Service

A lightweight UHAPI server runs inside the CloudStack management server and exposes endpoints under:

/ovirt-engine
    - /api - For APIs
    - /sso - For authentication
    - /services/pki-resource - For certificates

This server provides inventory discovery APIs required by Veeam and translates CloudStack resources into the structures expected by UHAPI.
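
As a rough illustration of the endpoint layout above, the sketch below classifies a request path by its prefix. Only the three path prefixes come from the description; the class, enum, and method names are hypothetical and are not taken from this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: only the path prefixes come from the PR description.
class UhapiPathRouter {
    enum Target { API, SSO, PKI_RESOURCE, UNKNOWN }

    private static final String BASE = "/ovirt-engine";
    private static final Map<String, Target> PREFIXES = new LinkedHashMap<>();

    static {
        PREFIXES.put(BASE + "/api", Target.API);
        PREFIXES.put(BASE + "/sso", Target.SSO);
        PREFIXES.put(BASE + "/services/pki-resource", Target.PKI_RESOURCE);
    }

    // Expects the path without a query string. A prefix matches when the path
    // equals it or continues with a further "/" segment.
    static Target route(String path) {
        for (Map.Entry<String, Target> entry : PREFIXES.entrySet()) {
            String prefix = entry.getKey();
            if (path.equals(prefix) || path.startsWith(prefix + "/")) {
                return entry.getValue();
            }
        }
        return Target.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(route("/ovirt-engine/api/vms"));               // API
        System.out.println(route("/ovirt-engine/services/pki-resource")); // PKI_RESOURCE
        System.out.println(route("/client/api"));                         // UNKNOWN
    }
}
```

Matching on whole segments rather than a bare `startsWith` avoids treating a path such as `/ovirt-engine/apiary` as an API request.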

The server:

  • exposes infrastructure inventory
  • handles authentication and session tokens
  • maps CloudStack resources to UHAPI-compatible representations

  2. Image Server (Data Plane), named CloudStack Image Service

A separate image server component is introduced to handle backup and restore data transfer operations.

This component:

  • serves disk image data during backup
  • receives image data during restore operations
  • exposes endpoints used by Veeam worker components
  • integrates with CloudStack storage to read and write VM disk data

Separating these two components ensures that:

  • metadata APIs and control operations remain lightweight
  • bulk image transfer operations are handled independently

Documentation PR: apache/cloudstack-documentation#642

Co-authored by @abh1sar @weizhouapache

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

shwstppr and others added 30 commits April 9, 2026 09:38
Signed-off-by: Abhishek Kumar <[email protected]>
@blueorangutan

@abh1sar a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17544

return true;
}

resetService(unitName);

If the service is running (checkResult == null) but the control socket is not ready, the code falls through, calls resetService, and then skips the start block because checkResult != null is false. It then waits up to 10 seconds for a socket that will never become ready because nothing restarted the service. The service is left in a broken state.
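
One possible shape for a fix, sketched under assumptions: decide between start and restart from whether the unit is active, and recover whenever the socket is not ready. `ServiceControl`, `ensureReady`, and `FakeControl` are hypothetical stand-ins for illustration, not the helpers used in this PR.

```java
// Hypothetical sketch of the control flow only. ServiceControl and its methods
// are stand-ins for illustration, not the helpers used in this PR.
class ServiceRecoverySketch {
    interface ServiceControl {
        boolean isActive(String unitName);
        boolean isSocketReady(String unitName);
        void reset(String unitName);
        void start(String unitName);
        void restart(String unitName);
    }

    static boolean ensureReady(ServiceControl ctl, String unitName, int maxPolls, long pollMillis) {
        boolean active = ctl.isActive(unitName);
        if (active && ctl.isSocketReady(unitName)) {
            return true; // healthy, nothing to do
        }
        ctl.reset(unitName);
        if (active) {
            ctl.restart(unitName); // running but socket not ready: restart instead of only waiting
        } else {
            ctl.start(unitName);
        }
        // Poll for the socket only after something has actually (re)started the unit.
        for (int i = 0; i < maxPolls; i++) {
            if (ctl.isSocketReady(unitName)) {
                return true;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    // Minimal in-memory fake used to exercise the flow.
    static class FakeControl implements ServiceControl {
        boolean active;
        boolean socketReady;
        int starts;
        int restarts;

        FakeControl(boolean active, boolean socketReady) {
            this.active = active;
            this.socketReady = socketReady;
        }

        public boolean isActive(String unitName) { return active; }
        public boolean isSocketReady(String unitName) { return socketReady; }
        public void reset(String unitName) { }
        public void start(String unitName) { starts++; active = true; socketReady = true; }
        public void restart(String unitName) { restarts++; socketReady = true; }
    }

    public static void main(String[] args) {
        FakeControl stuck = new FakeControl(true, false);
        boolean ready = ensureReady(stuck, "example.service", 3, 10L);
        System.out.println(ready + " restarts=" + stuck.restarts); // true restarts=1
    }
}
```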

Signed-off-by: Abhishek Kumar <[email protected]>
@apache apache deleted a comment from blueorangutan Apr 29, 2026
@apache apache deleted a comment from blueorangutan Apr 29, 2026
@shwstppr
Contributor Author

@blueorangutan package

@blueorangutan

@shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17647

Signed-off-by: Abhishek Kumar <[email protected]>
@shwstppr
Contributor Author

@blueorangutan test

@blueorangutan

@shwstppr a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@github-actions

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

Signed-off-by: Abhishek Kumar <[email protected]>
@blueorangutan

[SF] Trillian test result (tid-15987)
Environment: kvm-ol8 (x2), zone: Advanced Networking with Mgmt server ol8
Total time taken: 50128 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr12991-t15987-kvm-ol8.zip
Smoke tests completed. 151 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@shwstppr
Contributor Author

@blueorangutan package

@blueorangutan

@shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@shwstppr changed the title from "[WIP] backup: veeam kvm integration" to "backup: veeam kvm integration" Apr 30, 2026
@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17665

# Enable TLS for image server transfers. The keys are read from:
# cert file = /etc/cloudstack/agent/cloud.crt
# key file = /etc/cloudstack/agent/cloud.key
image.server.tls.enabled=true
Member

@shwstppr
is /etc/cloudstack/agent/cloud.ca.crt used ?

Contributor Author

@abh1sar can confirm, but I think yes: we use the cloud certificates for the image server.
To the Veeam worker VM, we pass only the root CA from the management server (MS).

Contributor

@abh1sar commented Apr 30, 2026

that's right. Is it ok to show the file names?

import com.cloud.utils.EnumUtils;

@APICommand(name = "createImageTransfer",
description = "Create image transfer for a disk in backup. This API is intended for testing only and is disabled by default.",
Member

This API is intended for testing only and is disabled by default.

This sentence appears in all APIs in this folder. Is this correct? @abh1sar

Contributor

Yes Wei, I don't want to expose these APIs to end users. Only the Veeam control service uses them.
But they are useful in testing. For example, we can run integration tests with them even without Veeam.
Any better way to handle this?


@Parameter(name = ApiConstants.FORMAT,
type = CommandType.STRING,
description = "Format of the image: cow/raw. Currently only raw is supported for download. Defaults to raw if not provided")
Member

Currently only raw is supported for download

I think the image is qcow2/cow format, right ?

Contributor

Yes, the description is confusing. The format here is not the disk format but the image transfer format. I'll update the description to "Format for the image transfer: raw/cow. 'raw' will create an NBD backend. 'cow' will use the File backend. For download, only the 'raw' format is supported. Default: raw"
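
The proposed semantics could be captured roughly as follows. This is a sketch under assumptions: the enum and method names are hypothetical, and the `UPLOAD` direction is an assumed counterpart to the download case named in the description.

```java
import java.util.Locale;

// Hypothetical sketch: names are illustrative, not the PR's actual types.
class ImageTransferFormats {
    enum Direction { UPLOAD, DOWNLOAD } // UPLOAD is an assumed counterpart to download
    enum Backend { NBD, FILE }

    enum Format {
        RAW(Backend.NBD),  // 'raw' creates an NBD backend
        COW(Backend.FILE); // 'cow' uses the File backend

        final Backend backend;

        Format(Backend backend) {
            this.backend = backend;
        }
    }

    // Defaults to RAW when no format is given and rejects non-raw downloads.
    // Unknown format strings fail with IllegalArgumentException from valueOf.
    static Format resolve(String requested, Direction direction) {
        Format format = (requested == null || requested.trim().isEmpty())
                ? Format.RAW
                : Format.valueOf(requested.trim().toUpperCase(Locale.ROOT));
        if (direction == Direction.DOWNLOAD && format != Format.RAW) {
            throw new IllegalArgumentException("Only the 'raw' format is supported for download");
        }
        return format;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, Direction.DOWNLOAD));        // RAW
        System.out.println(resolve("cow", Direction.UPLOAD).backend); // FILE
    }
}
```

Keeping the backend on the enum constant makes the "transfer format, not disk format" distinction explicit at the call site.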

import org.apache.cloudstack.context.CallContext;

@APICommand(name = "finalizeImageTransfer",
description = "Finalize an image transfe. This API is intended for testing only and is disabled by default.r",
Member

transfe -> transfer

import com.cloud.event.EventTypes;

@APICommand(name = "startBackup",
description = "Start a VM backup session. This API is intended for testing only and is disabled by default.",
Member

startBackup is a common word.

it would be better to explain what the intention is, and what hypervisors are supported, etc.

@@ -16,6 +16,10 @@
// under the License.
package com.cloud.api;

import static com.cloud.user.AccountManagerImpl.apiKeyAccess;
import static org.apache.cloudstack.api.ApiConstants.PASSWORD_CHANGE_REQUIRED;
import static org.apache.cloudstack.user.UserPasswordResetManager.UserPasswordResetEnabled;
Member

Are these from another PR?

Contributor Author

That is just due to the reordering of imports

@@ -10075,4 +10113,33 @@ private void setVncPasswordForKvmIfAvailable(Map<String, String> customParameter
vm.setVncPassword(customParameters.get(VmDetailConstants.KVM_VNC_PASSWORD));
}
}

protected boolean isBlankInstanceDefaultTemplate(VirtualMachineTemplate template) {
return KVM_VM_DUMMY_TEMPLATE_NAME.equals(template.getUniqueName());
Member

maybe rename to KVM_BLANK_VM_TEMPLATE_NAME or so

Comment on lines +154 to +157
if (!isKVMBackupExportServiceSupported(vm.getDataCenterId())) {
throw new CloudRuntimeException("Veeam-KVM integration can not be used along with the " + BackupProviderPlugin.valueIn(vm.getDataCenterId()) +
" backup provider. Either set backup.framework.enabled to false or set the Zone level config backup.framework.provider.plugin to \"dummy\".");
}
Member

This check is used in multiple locations; it would be better to extract it into a new method.
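
A minimal sketch of such an extraction, assuming a single guard method that callers invoke with the zone ID. The helper name `checkKVMBackupExportServiceSupported` and the functional-interface stand-ins are hypothetical; in the PR the check uses `isKVMBackupExportServiceSupported(...)` and `BackupProviderPlugin.valueIn(...)` and throws `CloudRuntimeException`, which is replaced here by `IllegalStateException` to keep the example self-contained.

```java
import java.util.function.LongFunction;
import java.util.function.LongPredicate;

// Hypothetical sketch: the lookups are injected as lambdas so the example runs
// without CloudStack on the classpath.
class BackupExportGuard {
    private final LongPredicate supportedInZone;     // stand-in for isKVMBackupExportServiceSupported
    private final LongFunction<String> providerName; // stand-in for BackupProviderPlugin.valueIn

    BackupExportGuard(LongPredicate supportedInZone, LongFunction<String> providerName) {
        this.supportedInZone = supportedInZone;
        this.providerName = providerName;
    }

    // Single place for the check and its error message; callers pass the zone ID.
    void checkKVMBackupExportServiceSupported(long zoneId) {
        if (!supportedInZone.test(zoneId)) {
            throw new IllegalStateException("Veeam-KVM integration can not be used along with the "
                    + providerName.apply(zoneId)
                    + " backup provider. Either set backup.framework.enabled to false or set the Zone level"
                    + " config backup.framework.provider.plugin to \"dummy\".");
        }
    }

    public static void main(String[] args) {
        // Zone 1 is treated as supported; "example-provider" is a made-up provider name.
        BackupExportGuard guard = new BackupExportGuard(zone -> zone == 1L, zone -> "example-provider");
        guard.checkKVMBackupExportServiceSupported(1L); // passes silently
        try {
            guard.checkKVMBackupExportServiceSupported(2L);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```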

Comment thread tools/apidoc/gen_toc.py
'listVmCheckpoints' : 'Backup and Recovery',
'deleteVmCheckpoint' : 'Backup and Recovery',
'ImageTransfer' : 'Backup and Recovery',
'VmCheckpoint' : 'Backup and Recovery',
'UnmanagedInstance': 'Virtual Machine',
Member

maybe only the last two are needed

    'ImageTransfer' : 'Backup and Recovery',
    'VmCheckpoint' : 'Backup and Recovery',

@@ -87,6 +87,7 @@ export default {
}
},
created () {
console.log('---------------', this.$route.meta.name)
Member

remove it ?

Member

@weizhouapache left a comment

overall LGTM
left some minor comments and questions.

great job @shwstppr @abh1sar !

7 participants